Will Pyramids Built of Nuggets Topple Over?
Authors
Abstract
The present methodology for evaluating complex questions at TREC analyzes answers in terms of facts called “nuggets”. The official F-score metric represents the harmonic mean between recall and precision at the nugget level. There is an implicit assumption that some facts are more important than others, which is implemented in a binary split between “vital” and “okay” nuggets. This distinction holds important implications for the TREC scoring model—essentially, systems only receive credit for retrieving vital nuggets—and is a source of evaluation instability. The upshot is that for many questions in the TREC testsets, the median score across all submitted runs is zero. In this work, we introduce a scoring model based on judgments from multiple assessors that captures a more refined notion of nugget importance. We demonstrate on TREC 2003, 2004, and 2005 data that our “nugget pyramids” address many shortcomings of the present methodology, while introducing only minimal additional overhead on the evaluation flow.
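The abstract describes the scoring model only in words, so here is a minimal sketch that contrasts binary vital/okay scoring with a pyramid-style variant. It rests on several assumptions. Recall is the fraction of vital nuggets returned. Precision uses a length allowance of 100 non-whitespace characters per returned nugget, whether vital or okay. F is the weighted harmonic mean of precision and recall, with a default β = 3 that you should treat as an assumption. For the pyramid variant, each nugget is assumed to be weighted by the number of assessors who labeled it vital, normalized by the largest such count, and recall becomes the matched share of the total weight. The `Nugget` record and all function names are hypothetical. This is an illustration, not the authors' scorer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nugget:
    # Hypothetical record for one nugget in a question's answer key.
    vital: bool    # single-assessor label: True = "vital", False = "okay"
    votes: int     # number of assessors (out of several) who labeled it vital
    matched: bool  # whether the system response contains this nugget


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favors recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


def length_precision(nuggets: List[Nugget], length: int) -> float:
    """Assumed allowance: each matched nugget (vital or okay) buys 100 characters.

    `length` is the response length in non-whitespace characters.
    """
    allowance = 100 * sum(n.matched for n in nuggets)
    if length <= allowance:  # also avoids dividing by zero for empty responses
        return 1.0
    return 1.0 - (length - allowance) / length


def binary_score(nuggets: List[Nugget], length: int, beta: float = 3.0) -> float:
    # Recall counts vital nuggets only, so okay nuggets earn no recall credit.
    vital = [n for n in nuggets if n.vital]
    recall = sum(n.matched for n in vital) / len(vital) if vital else 0.0
    return f_beta(length_precision(nuggets, length), recall, beta)


def pyramid_score(nuggets: List[Nugget], length: int, beta: float = 3.0) -> float:
    # Assumed weighting: vital votes normalized by the most-voted nugget.
    max_votes = max((n.votes for n in nuggets), default=0)
    if max_votes == 0:
        return 0.0
    weights = [n.votes / max_votes for n in nuggets]
    got = sum(w for w, n in zip(weights, nuggets) if n.matched)
    recall = got / sum(weights)
    return f_beta(length_precision(nuggets, length), recall, beta)


# Usage: a response that returns one vital nugget and one okay nugget.
key = [
    Nugget(vital=True, votes=3, matched=True),
    Nugget(vital=True, votes=2, matched=False),
    Nugget(vital=False, votes=1, matched=True),
    Nugget(vital=False, votes=0, matched=False),
]
print(round(binary_score(key, length=150), 3))   # 0.526
print(round(pyramid_score(key, length=150), 3))  # 0.69

# A response that returns only an okay nugget.
okay_only = [
    Nugget(vital=True, votes=3, matched=False),
    Nugget(vital=True, votes=2, matched=False),
    Nugget(vital=False, votes=1, matched=True),
    Nugget(vital=False, votes=0, matched=False),
]
print(round(binary_score(okay_only, length=80), 3))   # 0.0
print(round(pyramid_score(okay_only, length=80), 3))  # 0.182
```

Because pyramid recall is a ratio of summed weights, dividing by the largest vote count does not change the score. It only keeps each weight in [0, 1]. The second example shows the effect the abstract describes. A response containing only an okay nugget scores zero under binary scoring, but it receives partial credit once nugget importance is graded across assessors.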
Similar Resources
Different Structures for Evaluating Answers to Complex Questions: Pyramids Won't Topple, and Neither Will Human Assessors
The idea of “nugget pyramids” has recently been introduced as a refinement to the nugget-based methodology used to evaluate answers to complex questions in the TREC QA tracks. This paper examines data from the 2006 evaluation, the first large-scale deployment of the nugget pyramids scheme. We show that this method of combining judgments of nugget importance from multiple assessors increases the...
China’s Maritime Interest and the Great Game at Seas
China’s interest in the maritime waters arises from the geo-strategic importance of Sea Lines of Communications (SLOCs) vital to the country's oil supply. China is building strategic relationships and developing a naval capability to establish a forward presence along the SLOCs that connect China to the Middle East and to Africa. The entire stretch includes the South China Sea, Indian ...
Monitoring the Built-up Area Transformation Using Urban Index and Normalized Difference Built-up Index Analysis
Makassar is a metropolitan city in Indonesia that has recently experienced a massive increase in construction because of population growth. Mapping the spatial distribution and development of the built-up region is the best method that can be used as an indicator for setting urban planning policy. The purpose of this study is to identify changes in land use and density in Makassar Cit...
nD generalized map pyramids: Definition, representations and basic operations
Graph pyramids are often used for representing irregular image pyramids. For the 2D case, combinatorial pyramids have been recently defined in order to explicitly represent more topological information than graph pyramids. The main contribution of this work is the definition of pyramids of n-dimensional generalized maps. This extends the previous works to any dimension, and generalizes them in ...
Future study of Description System Architecture Approaches with Emphasis on Strategic Management
Systems Architecture is a generic discipline to handle objects (existing or to be created) called systems, in a way that supports reasoning about the structural properties of these objects. Systems Architecture is a response to the conceptual and practical difficulties of the description and the design of complex systems. ...